
Online Research Database In Technology

MPG.PuRe

FigShare

Quantitative sequence-function relationships in proteins based on gene ontology

Author: A Bairoch
A Bateman
A Conesa
AE Todd
Arthur M Lesk
CA Wilson
CZ Cai
D Devos
Daniel J Blankenberg
E Camon
EL Sonnhammer
J Piatigorsky
JA Gerlt
JA Ranea
JC Whisstock
K Fleming
L Holm
LB Koski
LJ Jensen
M Ashburner
M Shadidy
MA Andrade
MD Ganfornina
N Hulo
Naomi Altman
P Bork
R Karp
RA Laskowski
RC Edgar
S Jones
S Nakayama
SB Needleman
SE Brenner
SF Altschul
SR Eddy
SS Jeong
T Doerks
TF Smith
TK Attwood
Vineet Sangar
X Lu
Publication venue: BioMed Central
Publication date: 01/08/2007

Abstract. Background: The relationship between divergence of amino-acid sequence and divergence of function among homologous proteins is complex. The assumption that homologs share function, which is the basis of transfer of annotations in databases, must therefore be regarded with caution. Here, we present a quantitative study of sequence and function divergence, based on the Gene Ontology classification of function. We determined the relationship between sequence divergence and function divergence in 6828 protein families from the PFAM database. Within families there is a broad range of sequence similarity, from very closely related proteins (for instance, orthologs in different mammals) to very distantly related proteins at the limit of reliable recognition of homology. Results: We correlated the divergence in sequences, determined from pairwise alignments, with the divergence in function, determined by path lengths in the Gene Ontology graph, taking into account the fact that many proteins have multiple functions. Our results show that, among homologous proteins, the proportion of divergent functions decreases dramatically above a threshold of sequence similarity at about 50% residue identity. For proteins with more than 50% residue identity, transfer of annotation between homologs will lead to an erroneous attribution with a totally dissimilar function in fewer than 6% of cases. This means that for very similar proteins (above about 50% identical residues) the chance of completely incorrect annotation is low; however, because of the phenomenon of recruitment, it is still non-zero. Conclusion: Our results describe general features of the evolution of protein function, and serve as a guide to the reliability of annotation transfer, based on the closeness of the relationship between a new protein and its nearest annotated relative.
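The abstract above measures function divergence by path lengths in the Gene Ontology graph and notes that proteins can carry several annotations. The following is a minimal Python sketch of that idea, not the authors' actual procedure. The toy ontology, its term names, and the choice to take the minimum path length over all annotation pairs are assumptions made purely for illustration.

```python
from collections import deque
from itertools import product

# Hypothetical toy ontology (child term -> parent terms); not real GO identifiers.
TOY_PARENTS = {
    "binding": ["molecular_function"],
    "catalytic": ["molecular_function"],
    "kinase": ["catalytic"],
    "hydrolase": ["catalytic"],
    "atp_binding": ["binding"],
}


def build_undirected(parents):
    """Turn child -> parents edges into an undirected adjacency map."""
    graph = {}
    for child, parent_terms in parents.items():
        for parent in parent_terms:
            graph.setdefault(child, set()).add(parent)
            graph.setdefault(parent, set()).add(child)
    return graph


def path_length(graph, start, goal):
    """Shortest number of edges between two terms (breadth-first search)."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None  # the terms are not connected


def function_distance(graph, terms_a, terms_b):
    """Assumed rule for multi-function proteins: closest pair of annotations."""
    lengths = [path_length(graph, a, b) for a, b in product(terms_a, terms_b)]
    lengths = [length for length in lengths if length is not None]
    return min(lengths) if lengths else None


graph = build_undirected(TOY_PARENTS)
print(function_distance(graph, ["kinase", "atp_binding"], ["hydrolase"]))
```

Taking the minimum treats two proteins as functionally close if any pair of their annotations is close. An average or maximum over pairs would be an equally plausible assumption and would give a stricter measure.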

Bacterial Genomes: Habitat Specificity and Uncharted Organisms

Author: A Bernal
C Pedrós-Alió
D Wu
EA Dinsdale
FE Angly
Fernando Dini Andreote
Francisco Dini-Andreote
GR Burke
H Toh
J Raes
JA Gilbert
Jack T. Trevors
JAG Ranea
Jan Dirk van Elsas
JE Barrick
JK Harris
JT Trevors
L Oksana
L Philippot
M Touchon
M Wagner
ML Sogin
NR Pace
P Lapierre
P Yilmaz
PKH Lee
RT Jones
S Abby
SG Tringe
T Ishoey
T Woyke
TM Vogel
Welington Luiz Araújo
Publication venue: Springer-Verlag
Publication date: 01/01/2012

The capability and speed of generating genomic data have increased profoundly since the release of the draft human genome in 2000. Additionally, sequencing costs have continued to plummet as highly efficient next-generation sequencing technologies became available and commercial facilities promoted market competition. However, new challenges have emerged as researchers attempt to efficiently process the massive amounts of sequence data being generated. First, the described genome sequences are unequally distributed among the branches of bacterial life and, second, bacterial pan-genomes are often not considered when setting aims for sequencing projects. Here, we propose that scientists should aim for a more even genomic representation of organisms across most of the bacterial tree of life. Moreover, they should take into account the natural variation that is often observed within bacterial species and the role of the often changing surrounding environment and natural selection pressures, which is central to bacterial speciation and genome evolution. Not only will such efforts contribute to our overall understanding of the microbial diversity extant in ecosystems and of the structuring of the extant genomes, but they will also facilitate the development of better methods for (meta)genome annotation.

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Finding the “Dark Matter” in Human and Yeast Protein Network Prediction and Modelling

Author: A Ben-Hur
A Birnbaum
A Chatr-aryamontri
A Fard-Karimpour
A Ruepp
Adam J. Reid
AL Barabasi
Andrew B. Clegg
Andrey Rzhetsky
B Lehner
B Linghu
B Snel
CE Shannon
ChJ Needham
Christine Orengo
Corin Yeats
D Hwang
E Eden
E Ravasz
F Pazos
Francisca Sanchez-Jimenez
GJ Dennis
H Pearson
H Yu
Ian Morilla
JA Ranea
JF Rual
JH Halton
Jon G. Lees
JS Mattick
Juan A. G. Ranea
KR Brown
L Matthews
LH Greene
LJ Jensen
LJ Lu
M Ashburner
M Brinkmeier
M Kanehisa
ME Cusick
MEJ Newman
N Metropolis
PM Bowers
PW Lord
R Albert
R Massey
RB Russell
RD Finn
S Kerrien
S Mika
S Peri
S Suthram
S Yellaboina
SP Colgan
TJ van Dam
VC Raykar
WF Bauer
WS Noble
Publication venue: Public Library of Science
Publication date: 01/09/2010

Accurate modelling of biological systems requires a deeper and more complete knowledge about the molecular components and their functional associations than we currently have. Traditionally, new knowledge on protein associations generated by experiments has played a central role in systems modelling, in contrast to generally less trusted bio-computational predictions. However, we will not achieve realistic modelling of complex molecular systems if the current experimental designs lead to biased screenings of real protein networks and leave large, functionally important areas poorly characterised. To assess the likelihood of this, we have built comprehensive network models of the yeast and human proteomes by using a meta-statistical integration of diverse computationally predicted protein association datasets. We have compared these predicted networks against combined experimental datasets from seven biological resources at different levels of statistical significance. These eukaryotic predicted networks resemble all the topological and noise features of the experimentally inferred networks in both species, and we also show that this observation is not due to random behaviour. In addition, the topology of the predicted networks contains information on true protein associations, beyond the constitutive first order binary predictions. We also observe that most of the reliable predicted protein associations are experimentally uncharacterised in our models, constituting the hidden or “dark matter” of networks by analogy to astronomical systems. Some of this dark matter shows enrichment of particular functions and contains key functional elements of protein networks, such as hubs associated with important functional areas like the regulation of Ras protein signal transduction in human cells. Thus, characterising this large and functionally important dark matter, elusive to established experimental designs, may be crucial for modelling biological systems. In any case, these predictions provide a valuable guide to these experimentally elusive regions.

Public Library of Science (PLOS)

Ancient horizontal gene transfer and the last common ancestors

Author: A Barzel
A Burt
A Hua-Van
A Sauerwald
A Stoltzfus
AM Barbaglia
B Boussau
B Reisinger
C Darwin
Cheryl P Andam
CP Andam
CR Woese
D Darriba
D Williams
DA Benson
DH Rothman
EGJ Danchin
EV Koonin
G Borrel
G Eriani
G Srinivasan
GJ Szöllosi
GM Nagel
GP Fournier
Gregory P Fournier
H Grosjean
J Peretó
J Thomas
JA Krzycki
JAG Ranea
JL Siefert
JM Kavran
Johann Peter Gogarten
JP Gogarten
K Swithers
K Vetsigian
L Olendzenski
L Ribas de Pouplana
L Sinzelle
LS Frost
M Ibba
M Khomyakova
M Syvanen
M Wu
MH Mazauric
MV Omelchenko
N Lartillot
N Nameki
O Penn
O Zhaxybayeva
P Kück
P O’Donoghue
P Schimmel
R Dawkins
R Jain
RC Edgar
S Bilokapic
S Gould
S Guindon
S Herring
S Morris
S Osawa
SQ Le
T Tuller
TJ Treangen
VV Kapitonov
WM Fitch
Y Diaz-Lazcoz
Y Zhang
YI Wolf
Z-P Fang
Publication venue: Springer Science and Business Media LLC
Publication date: 22/04/2015

Background: The genomic history of prokaryotic organismal lineages is marked by extensive horizontal gene transfer (HGT) between groups of organisms at all taxonomic levels. These HGT events have played an essential role in the origin and distribution of biological innovations. Analyses of ancient gene families show that HGT existed in the distant past, even at the time of the organismal last universal common ancestor (LUCA). Most gene transfers originated in lineages that have since gone extinct. Therefore, one cannot assume that the last common ancestors of each gene were all present in the same cell representing the cellular ancestor of all extant life. Results: Organisms existing as part of a diverse ecosystem at the time of LUCA likely shared genetic material between lineages. If these other lineages persisted for some time, HGT with the descendants of LUCA could have continued into the bacterial and archaeal lineages. Phylogenetic analyses of aminoacyl-tRNA synthetase protein families support the hypothesis that the molecular common ancestors of the most ancient gene families did not all coincide in space and time. This is most apparent in the evolutionary histories of the seryl-tRNA synthetase and threonyl-tRNA synthetase protein families, each containing highly divergent “rare” forms, as well as in the sparse phylogenetic distributions of pyrrolysyl-tRNA synthetase and the bacterial heterodimeric form of glycyl-tRNA synthetase. These topologies and phyletic distributions are consistent with horizontal transfers from ancient, likely extinct branches of the tree of life. Conclusions: Of all the organisms that may have existed at the time of LUCA, by definition only one lineage is survived by known progeny; however, this lineage retains a genomic record of heterogeneous genetic origins. The evolutionary histories of aminoacyl-tRNA synthetases (aaRS) are especially informative in detecting this signal, as they perform primordial biological functions, have undergone several ancient HGT events, and contain many sites with low substitution rates, allowing deep phylogenetic reconstruction. We conclude that some aaRS families contain groups that diverged before LUCA. We propose that these ancient gene variants be described by the term “hypnologs”, reflecting their ancient, reticulate origin from a time in life history that has been all but erased.
National Science Foundation (U.S.) (Grant DEB 0830024); Exobiology Program (U.S.) (Grant NNX10AR85G); United States. National Aeronautics and Space Administration (Postdoctoral Program)

DSpace@MIT

Conserved synteny at the protein family level reveals genes underlying Shewanella species’ cold tolerance and predicts their novel phenotypes

Author: A Ferrandez
A Polissi
A Vezzi
AJ Auman
AJ Enright
AL Delcher
Anna Y. Obraztsova
BA Methe
Byung H. Park
C Vieille
CA Orengo
CC Hase
CV Susana
CW Saltikov
D Magnani
Denise D. Schmoyer
DN Wilson
E Lander
Edward C. Uberbacher
EJ Nelson
FA Armstrong
G Apic
G Meshulam-Simon
Guruprasad H. Kora
J Kawamoto
J Mrazek
J Nogales
JA Gralnick
JA Ranea
Jim K. Fredrickson
JK Fredrickson
JO Korbel
JP Gogarten
JR Roth
K Chourey
K Kogure
K Nakaminami
K Venkateswaran
Kenneth H. Nealson
KS Makarova
M Long
M Madera
MA Larkin
Margaret F. Romine
Margrethe H. Serres
MF Romine
Miriam L. Land
MJ Maher
MK Chattopadhyay
P Dibrov
R Ihaka
S Brohee
S Phadtare
SF Altschul
SR Brinsmade
SV Date
T Hiramatsu
TA Bobik
Tatiana V. Karpinets
TD Lawley
Terence B. Kothe
TL Kieft
UG Erdal
WN Konings
X Qiu
Y Liu
Yanbing Wang
YF Lin
Z Hu
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

© The Authors 2009. This article is distributed under the terms of the Creative Commons Attribution Noncommercial License. The definitive version was published in Functional & Integrative Genomics 10 (2010): 97-110, doi:10.1007/s10142-009-0142-y.
Bacteria of the genus Shewanella can thrive in different environments and demonstrate significant variability in their metabolic and ecophysiological capabilities, including cold and salt tolerance. The genomic characteristics underlying this variability across species are largely unknown. In this study, we address the problem by comparing the physiological, metabolic, and genomic characteristics of 19 sequenced Shewanella species. We have employed two novel approaches, one based on the association of a phenotypic trait with the number of trait-specific protein families (Pfam domains) and the other on the conservation of synteny (order in the genome) of the trait-related genes. Our first approach is top-down and involves experimental evaluation and quantification of the species’ cold tolerance, followed by identification of the correlated Pfam domains and of genes with a conserved synteny. The second, a bottom-up approach, predicts novel phenotypes of the species by calculating profiles of each Pfam domain among their genomes, followed by pairwise correlation of the profiles and network clustering. Using the first approach, we found a link between cold and salt tolerance of the species and the presence in the genome of a Na+/H+ antiporter gene cluster. Other cold-tolerance-related genes include peptidases, chemotaxis sensory transducer proteins, a cysteine exporter, and helicases. Using the bottom-up approach, we found several novel phenotypes in the newly sequenced Shewanella species, including degradation of aromatic compounds by an aerobic hybrid pathway in Shewanella woodyi, degradation of ethanolamine by Shewanella benthica, and propanediol degradation by Shewanella putrefaciens CN32 and Shewanella sp. W3-18-1.
This research was supported by the U.S. Department of Energy (DOE) Office of Biological and Environmental Research under the Genomics: GTL Program via the Shewanella Federation consortium.
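The top-down approach described above associates a phenotypic trait with the number of Pfam domains per genome. As a rough illustration only, the Python sketch below ranks domains by the Pearson correlation between their per-genome counts and a trait score. The domain labels, counts, and trait values are invented, and Pearson correlation is an assumed choice of statistic, since the abstract does not say which measure the authors used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def rank_domains(domain_counts, trait):
    """Sort domains by absolute correlation between their counts and the trait."""
    scores = {name: pearson(counts, trait) for name, counts in domain_counts.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical data: one trait score and one count per genome (four genomes).
cold_tolerance = [1.0, 2.0, 3.0, 4.0]
domain_counts = {
    "PF_A": [2, 4, 6, 8],  # hypothetical domain whose count rises with the trait
    "PF_B": [5, 5, 5, 5],  # hypothetical domain with a constant count
    "PF_C": [4, 3, 2, 1],  # hypothetical domain whose count falls with the trait
}

for name, score in rank_domains(domain_counts, cold_tolerance):
    print(name, round(score, 3))
```

With only 19 genomes, as in the study, correlations of this kind would need a significance check before being interpreted. The sketch omits that step.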

Woods Hole Open Access Server

Towards a comprehensive structural coverage of completed genomes: a structural genomics viewpoint

Author: A Andreeva
A Bateman
A Elofsson
A Lupas
A McPherson
A Sali
AE Todd
AJ Enright
B Rost
C Sander
C Vogel
CA Orengo
CH Wu
Christine A Orengo
D Baker
D Busso
D Vitkup
DT Jones
FMG Pearl
GA Reeves
I Letunic
IV Grigoriev
J Liu
J Park
J Thornton
J Westbrook
JA Ranea
JC Norvell
JC Wootton
JD Watson
JM Chandonia
K Karplus
KT Simons
M Linial
M Skovgaard
N Siew
PJ Kersey
R Sanchez
RA Laskowski
RC Stevens
RI Sadreyev
RL Marsden
Russell L Marsden
SA Lesley
SE Brenner
SH Kim
SK Burley
SR Eddy
TC Terwilliger
Tony A Lewis
W Minor
W Tian
Y Kim
Y Yan
Publication venue: BioMed Central
Publication date: 01/03/2007

BACKGROUND: Structural genomics initiatives were established with the aim of solving protein structures on a large scale. For many initiatives, such as the Protein Structure Initiative (PSI), the primary aim of target selection is to structurally characterise protein families which, so far, lack a structural representative. It is therefore of considerable interest to gain insights into the number and distribution of these families, and into what efforts may be required to achieve a comprehensive structural coverage across all protein families. RESULTS: In this analysis we have derived a comprehensive domain annotation of the genomes using CATH, Pfam-A and Newfam domain families. We consider what proportions of structurally uncharacterised families are accessible to high-throughput structural genomics pipelines, specifically those targeting families containing multiple prokaryotic orthologues. In measuring the domain coverage of the genomes, we show the benefits of selecting targets from structurally uncharacterised domain families whilst also pursuing additional targets from large, structurally characterised protein superfamilies. CONCLUSION: This work suggests that such a combined approach to target selection is essential if structural genomics is to achieve a comprehensive structural coverage of the genomes, leading to greater insights into structure and the mechanisms that underlie protein evolution.